


Institutional Archive of the Naval Postgraduate School 





Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items 


1973-09 


Applications of cluster analysis to some 
problems of interest to the U.S. Department of State 


Lamping, James Richard 


Monterey, California. Naval Postgraduate School 
http://hdl.handle.net/10945/16489


This publication is a work of the U.S. Government as defined in Title 17, United 
States Code, Section 101. Copyright protection is not available for this work in the 
United States. 


Downloaded from NPS Archive: Calhoun 


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed -- and published -- scholarly author.

Dudley Knox Library / Naval Postgraduate School

411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 





http://www.nps.edu/library 







NAVAL POSTGRADUATE SCHOOL
Monterey, California


THESIS


APPLICATIONS OF CLUSTER ANALYSIS
TO SOME PROBLEMS OF INTEREST
TO THE U. S. DEPARTMENT OF STATE


by 
James Richard Lamping 
Thesis Adviscr : Ba OnUDerU 


September 1973 


Approved for public release; distribution unlimited.





Applications of Cluster Analysis
to Some Problems of Interest
to the U. S. Department of State


by


James Richard Lamping
Lieutenant Commander, United States Navy
B.S., University of Notre Dame, 1964


Submitted in partial fulfillment of the
requirements for the degree of


MASTER OF SCIENCE IN OPERATIONS RESEARCH
from the


NAVAL POSTGRADUATE SCHOOL
September 1973







ABSTRACT 


This thesis asserts that Cluster Analysis, or Numerical 
Taxonomy, has many potential applications in the field of 
international relations. It demonstrates two representative 
applications. Both examples treat the nations of the world 
as objects having measurable attributes, and both examples 
use selected attributes to produce a dendrogram (or 
hierarchical classification) of the nations of the world. 

In one example this dendrogram is used to objectively group
the nations into blocs based on external economic ties. In
the other example the dendrogram is used to highlight
interactions among the attributes, ignoring the identity of
individual nations, the same way a scatter plot highlights
interactions between two variables.





TABLE OF CONTENTS 


I. BACKGROUND
   A. CLUSTER ANALYSIS DEFINED
   B. PREVIOUS APPLICATIONS OF CLUSTER ANALYSIS
II. PROPOSED APPLICATIONS OF CLUSTER ANALYSIS IN THE
    STATE DEPARTMENT
   A. FIRST APPLICATION: TO HIGHLIGHT INTERACTIONS
      OF VARIABLES
      1. General Description
      2. Scenario for Demonstration
      3. Choice of Data
      4. Choice of Dissimilarity Coefficient
      5. Choice of Algorithm
      6. From Dendrogram to Cluster Profiles
   B. SECOND APPLICATION: TO CLASSIFY COUNTRIES
      OBJECTIVELY
      1. General Description
      2. Scenario for Demonstration
      3. Choice of Data
      4. Choice of Dissimilarity Coefficient
      5. Choice of Algorithm
      6. Explanation of Dendrogram
      7. Work in Progress
III. CONCLUSIONS
APPENDIX A
COMPUTER PROGRAM





LIST OF REFERENCES
INITIAL DISTRIBUTION LIST
FORM DD 1473





LIST OF TABLES


I.   DATA INPUT TO HIGHLIGHTING DEMONSTRATION

II.  CLUSTER PROFILES OUTPUT FROM HIGHLIGHTING
     DEMONSTRATION

III. CLUSTER PROFILES OUTPUT FROM HIGHLIGHTING
     DEMONSTRATION, GRAPHED

IV.  RESULTS OF HIGHLIGHTING DEMONSTRATION:
     PROBABLE INTERACTIONS DEDUCED AT
     CLUSTER LEVEL k = 8

V.   DATA INPUT TO OBJECTIVE CLASSIFICATION
     DEMONSTRATION





LIST OF DRAWINGS


1. DENDROGRAM IN HIGHLIGHTING DEMONSTRATION

2. DENDROGRAM OUTPUT FROM OBJECTIVE CLASSIFICATION
   DEMONSTRATION





I. BACKGROUND


A. CLUSTER ANALYSIS DEFINED 
The subject of this thesis is a group of mathematical
techniques known collectively as either Cluster Analysis or
Numerical Taxonomy. The terms are synonymous. Their formal
definition, paraphrased from Ref. 6, is actually a sequence
of definitions, as follows:

classification system - a set of subsets of a set of
objects which conveys some information about the objects

taxonomy - the science of constructing classificatory
systems

Cluster Analysis or Numerical Taxonomy - the science
of constructing mathematical classificatory systems

In less formal terms, Cluster Analysis includes all
mathematical methods of classifying objects into sets so as to
represent complex data in a simpler way which will serve as
a fruitful source of hypotheses.

Cluster Analysis is a two-stage process. The first stage
is to choose quantifiable attributes that describe the objects,
and then use these attributes to measure the pair-wise
dissimilarity among the objects. The second stage is to represent
these dissimilarities by an appropriate classificatory system
or display.

The input to Cluster Analysis is normally an n x m matrix
of data: measurements of m attributes for each of n objects.

The output from Cluster Analysis is normally one of three
displays:


A hierarchical classification, commonly called a tree 
diagram or dendrogram; 


A partition of the objects into mutually exclusive sets, 
each set described by a "profile" or vector of m average 
attribute values; 


A "clumping" of the objects into sets that may overlap, each 
set again described by a profile. 


The value of these outputs is that they summarize the original
data objectively and they tend to highlight subtle interactions
in the original data, enabling a user to formulate reasonable
hypotheses about these interactions.


B. PREVIOUS APPLICATIONS OF CLUSTER ANALYSIS

Cluster Analysis was developed in the eighteenth century
by botanists and biologists attempting to inject more
objectivity into their classifications of plant and animal specimens
(the familiar phylum-genus-species scheme). Subsequently the
same technique was used by geologists. Most recently, Cluster
Analysis has found numerous applications in the social sciences,
particularly in psychology. Reference 1 describes an
application that is representative.





II. PROPOSED APPLICATIONS OF CLUSTER ANALYSIS
IN THE STATE DEPARTMENT

The United States State Department is currently trying
to revitalize its policy-making and resource-allocating
functions, a la the Defense Department metamorphosis under
Robert McNamara. This revitalization effort has been
underway for eight years now. In that time there has been published
a plethora (References 2 and 3 are representative) of "master
plans" for the incorporation of Systems Analysis in the State
Department.

This paper does not propose another master plan, but
merely suggests that a single existing statistical analysis
technique has useful applications within the State Department.
The existing technique is Cluster Analysis, and the potential
applications within the State Department are described and
demonstrated in the pages that follow.


A. FIRST APPLICATION: TO HIGHLIGHT INTERACTIONS OF VARIABLES 
1. General Description

In this application, Cluster Analysis highlights the
interaction of several variables the same way a scatter plot
would for two variables. It inputs an n x m matrix of data
(n countries, each described by m variables) and outputs a
dendrogram. The dendrogram itself says nothing about
interactions among the variables. But it is a simple matter to
select a clustering level k (where k = the number of clusters)
and plot the distribution of the m variables within each
cluster. Comparisons among these plots should bring out all
significant interactions among the variables. In particular,
they should highlight mutual interaction among three variables
or even among four variables just as easily as they highlight
a two-way interaction. This is a potential not shared by
factor analysis and regression techniques.
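Selecting a clustering level k amounts to replaying the fusions recorded in the dendrogram until only k clusters remain. The following is a minimal sketch of that step, not the thesis program: the `merges` format (an ordered list of object pairs, one pair per fusion) and the function name are hypothetical choices made for illustration.

```python
def partition_at_level(n, merges, k):
    """Replay the first n - k fusions to obtain the partition into k clusters.

    n      -- number of objects (countries), labelled 0 .. n-1
    merges -- hypothetical dendrogram encoding: (a, b) pairs in fusion order,
              where a and b are any members of the two clusters being fused
    k      -- desired number of clusters
    """
    clusters = [{i} for i in range(n)]
    for a, b in merges[: n - k]:
        # Find the clusters currently holding a and b and fuse them.
        merged = (next(c for c in clusters if a in c)
                  | next(c for c in clusters if b in c))
        clusters = [c for c in clusters if a not in c and b not in c]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy dendrogram over five objects (illustrative only).
toy_merges = [(0, 1), (2, 3), (2, 4), (0, 2)]
three_clusters = partition_at_level(5, toy_merges, 3)
two_clusters = partition_at_level(5, toy_merges, 2)
```

Once the partition is known, the distribution of each variable can be tabulated or plotted cluster by cluster.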
2. Scenario for Demonstration 

The United States Constitution lists Freedom of the 
Press, Freedom of Speech, and Freedom of Assembly as inalien- 
able human rights. Although one might argue that these 
precise terms have been eclipsed by communications technology, 
most of the Western world would agree that "free and facile 
communication among the people" is an essential quality in a 
free and productive society. Having tentatively accepted
this thesis, a sociologist or political scientist might well
wish to dissect the concept of "free and facile communication
among the people," to define it in quantifiable terms.
Moreover, a policy planner in the State Department might well wish
to go one step further: to use this quantitative definition
in a comparative study of the countries of the world. Such
comparisons are made every day with respect to Gross National
Product, Life Expectancy, etc. Why not also tabulate a FFCAP
(free and facile communication among the people) Index?

Assuming that the State Department considered it
worthwhile to develop such an index, it would probably task
a team of its sociologists to propose a list of measurable
factors that either contribute to or detract from "free and
facile communication among the people." This team would
certainly appreciate the practical advantages of building
this list around statistics that had already been measured,
and using the existing data to continually validate their
theories against the real world.

Thus they would probably be faced, early on in their
proceedings, with a large volume of existing data to be
perused, or analyzed in a very general sense. At this point
they could profit greatly from applying Cluster Analysis to
highlight the interaction of variables.

3. Choice of Data

To demonstrate this application, the author has usurped
the role of State Department sociologist and selected the
following statistics as "measurable factors that either
contribute to or detract from free and facile communication among
the people":

Variable 1. Concentration of Population in Cities, 1965
Variable 2. Radios per 1000 Population, 1965
Variable 3. Students in Higher Education (Third Level)
            per One Million Population, 1965
Variable 4. Ethno-Linguistic Fractionalization
Variable 5. Press Freedom Index, 1965

See Appendix A for definitions of these variables.
It is readily admitted that this list is not as
complete as it should be. In particular, "Literacy Rate" is
conspicuous by its absence, and some measure of newspaper
circulation seems a necessary counterpart to Variable 2.
The reason for such omissions was unavailability of data,
an affliction that is widespread among independent researchers
but not shared by insiders at the State Department.

The unavailability of data to this researcher imposed
another limitation on this demonstration besides the omission
of some desirable variables. Table I displays values of the
aforementioned variables for only 85 of the 136 nations in the
world. It was necessary to delete the other nations because
of excessive missing data.

4. Choice of Dissimilarity Coefficient 

This section describes the process of transforming the
data in Table I to a matrix of Dissimilarity Coefficients.

The first decision point was to specify a formula for
the Dissimilarity Coefficient (DC). The DC is a single real
number specifying the amount of dissimilarity between Country
A and Country B, obtained by somehow combining the five data
points describing each country. There are many different
formulas for transforming these ten data points to a single
DC. Cormack presents a concise but comprehensive summary of
all the common formulas in Table 1 of Ref. 4.

In the situation at hand it was decided to use a
Euclidean Distance, standardized by range. That is,

\[
DC(A,B) = \sum_{v=1}^{5} W_v \, (X_{Av} - X_{Bv})^2
\]

\[
\text{where} \quad W_v = \frac{1}{\max_{i,j} \, (X_{iv} - X_{jv})}
\]





TABLE I. DATA INPUT TO HIGHLIGHTING DEMONSTRATION

[Table not recoverable from the scan. Columns: NAME, CODE, VARIABLE 1, VARIABLE 2, VARIABLE 3, VARIABLE 4, VARIABLE 5; one row for each of the 85 countries.]








Euclidean Distance was preferred to others simply
because of its geometric, intuitive appeal.

On the other hand, a more elaborate rationale went
into the decision to standardize by range. First of all it
was decided that some type of standardizing (scaling) would
be appropriate. Most of the literature argues convincingly
that scaling is inappropriate when the difference in scale
between two variables may be intrinsic; but no such intrinsic
differences seemed likely in the five variables used here.
Moreover, using unstandardized Euclidean Distance in this
situation would certainly result in the DC being driven by
Variables 2 and 3 while Variables 1 and 4 would be virtually
ignored; and there is no a priori reason to intentionally
emphasize one variable over another in this application.

Having decided to use some type of scaling, there
were many types to choose from, namely scaling by standard
deviation, scaling by range, and scaling by some other
heterogeneity measure (see page 326 of Ref. 4 for a comparative
discussion). Since the data distributions were mixed there
was no compelling theoretical reason for choosing one scaling
method over another. Eventually, scaling by range was
selected for its simplicity. It would be interesting to see
if scaling by standard deviation would significantly change
the end result (dendrogram) from that obtained here; but this
was not done.

Using Euclidean Distance standardized by range, the
425 data points in Table I were transformed into a matrix of
3570 Dissimilarity Coefficients.
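The transformation from a data matrix to a set of pair-wise Dissimilarity Coefficients can be sketched as follows. This is an illustration under stated assumptions, not the thesis program: it uses the common form of range-standardized Euclidean distance (each difference divided by the variable's range before squaring, with a square root at the end), and the exact placement of the weight and of the square root in the thesis formula may differ. The toy data and all function names are hypothetical. The last lines check the counts quoted in the text: 85 countries x 5 variables = 425 data points, and 85 x 84 / 2 = 3570 unordered pairs.

```python
from itertools import combinations
from math import sqrt


def column_ranges(rows):
    """Return max - min for each variable (column) of the data matrix."""
    ranges = [max(col) - min(col) for col in zip(*rows)]
    if any(r == 0 for r in ranges):
        raise ValueError("a constant variable cannot be scaled by its range")
    return ranges


def dissimilarity(a, b, ranges):
    """Range-standardized Euclidean distance between two objects (assumed form)."""
    return sqrt(sum(((x - y) / r) ** 2 for x, y, r in zip(a, b, ranges)))


def dc_matrix(rows):
    """Return {(i, j): DC} for every unordered pair of objects with i < j."""
    ranges = column_ranges(rows)
    return {(i, j): dissimilarity(rows[i], rows[j], ranges)
            for i, j in combinations(range(len(rows)), 2)}


# Hypothetical toy data: 4 objects x 2 variables on very different scales.
toy = [[0.10, 100.0], [0.20, 300.0], [0.50, 1100.0], [0.30, 500.0]]
dc = dc_matrix(toy)

# Counts quoted in the text for the real data set.
n_countries, n_variables = 85, 5
data_points = n_countries * n_variables
pair_count = n_countries * (n_countries - 1) // 2
```

With this scaling every variable contributes a term between 0 and 1, so no single variable can drive the coefficient merely because of its units.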



5. Choice of Algorithm

There are several methods of proceeding from the matrix
of Dissimilarity Coefficients (DC) to a partition (or a
dendrogram of partitions) of the countries. Choosing one was the
next decision point in this demonstration. All methods in
general use fall into one of three categories:

a. Agglomerative Algorithms - a series of successive
   fusions of the 85 countries into groups.

b. Divisive Algorithms - a partitioning of the complete set
   of countries successively into finer partitions.

c. Reallocative Algorithms - successive reallocation of
   individual countries between the sets of some
   initial partition.

It was first decided that a reallocative algorithm would be
inappropriate because it requires an initial partition, and
there was no a priori evidence to suggest what that partition
should be. Between the two remaining alternatives, theoretical
considerations did not yield a preference: in nearly every
case, agglomerative and divisive algorithms produce identical
dendrograms. The agglomerative algorithm was selected because
its details have been more thoroughly documented in the
literature.
Within the family of agglomerative algorithms there
are at least eight documented alternative "sorting strategies,"
or formulas for determining the DC between cluster (k) and
cluster (ij), using the DC between cluster (k) and cluster (i)
and the DC between cluster (k) and cluster (j). If the matrix
of Dissimilarity Coefficients contains natural and compelling
clusters, each having strong internal cohesion and strong
external isolation, then the choice of sorting strategy is
not a critical one; but if natural and compelling clusters
are not present, different sorting strategies can produce
markedly different dendrograms. The eight common sorting
strategies are explained in Chapter 3 of Ref. 4. Of those
eight, the Complete Linkage-Furthest Neighbor sorting strategy
and the Single Linkage-Nearest Neighbor sorting strategy
represent the extremes. The others may be thought of as
compromises between these two. The Complete Linkage-Furthest
Neighbor sorting strategy can be expressed mathematically as

    DC(k,ij) = max( DC(k,i), DC(k,j) )

It produces compact clusters having high internal cohesion;
but it may sacrifice external isolation when natural and
compelling clusters are not intrinsic in the data. At the
other extreme, the Single Linkage-Nearest Neighbor sorting
strategy can be expressed mathematically as

    DC(k,ij) = min( DC(k,i), DC(k,j) )

It tends to produce chains of members instead of compact
clusters, especially when natural and compelling clusters
are not intrinsic in the data. In some
applications this tendency is desirable.
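The two extreme sorting strategies differ only in the rule used to update the DC after each fusion, so a single agglomerative routine can serve both. The sketch below is a minimal illustration, not the program listed in the thesis; the function name, the data layout, and the toy points are hypothetical. Passing `max` gives Complete Linkage-Furthest Neighbor and passing `min` gives Single Linkage-Nearest Neighbor.

```python
def agglomerate(dc, n, combine):
    """Fuse n objects step by step; return [(members_a, members_b, level), ...].

    dc      -- {(i, j): dissimilarity} for every pair of objects with i < j
    combine -- max for Complete Linkage, min for Single Linkage
    """
    clusters = {i: (i,) for i in range(n)}   # cluster id -> member objects
    dist = dict(dc)                           # working copy, keys (low id, high id)
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Fuse the pair of clusters with the smallest DC (ties: lowest ids).
        i, j = min(dist, key=lambda pair: (dist[pair], pair))
        merges.append((clusters[i], clusters[j], dist[(i, j)]))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        del clusters[i], clusters[j]
        # Update rule: DC(k, ij) = combine(DC(k, i), DC(k, j)).
        for k in clusters:
            d_ki = dist[tuple(sorted((k, i)))]
            d_kj = dist[tuple(sorted((k, j)))]
            dist[(k, next_id)] = combine(d_ki, d_kj)
        # Drop every entry that still refers to the two fused clusters.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        clusters[next_id] = merged
        next_id += 1
    return merges


# Toy example: five points on a line with slowly growing gaps (illustrative).
points = [0.0, 1.0, 2.1, 3.3, 4.6]
toy_dc = {(a, b): abs(points[a] - points[b])
          for a in range(5) for b in range(a + 1, 5)}

single = agglomerate(toy_dc, 5, min)     # grows one chain, object by object
complete = agglomerate(toy_dc, 5, max)   # builds two compact groups first
```

On these toy points the single-linkage run attaches one object at a time to a growing chain, while the complete-linkage run forms the groups (0, 1) and (2, 3, 4) before joining them.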

For the demonstration at hand compact clusters were
considered more desirable than chains, and the Complete
Linkage-Furthest Neighbor sorting strategy was selected. It
would be interesting to see if one of the compromise sorting
strategies, such as Group Average, would significantly change
the end result (dendrogram) from that obtained here; but this
was not done.

The dendrogram in Drawing 1 was obtained using the
Complete Linkage-Furthest Neighbor sorting strategy in an
agglomerative algorithm. The computer program is listed at
the end of this thesis for information. It should be noted
that the matrix of Dissimilarity Coefficients was standardized
to the (0.0, 100.0) interval, using scaling by range, before
it was input to the clustering algorithm. But such
standardizing was made for computational convenience only.
Its single effect was a monotonic transformation of the
numerical scale across the top of the dendrogram. The shape
of the dendrogram was unaffected.
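The reason the shape is unaffected is that Complete Linkage uses the coefficients only through comparisons and through the max operation, and both are preserved by any strictly increasing rescaling. A small sketch of that argument follows; the linear form of the rescaling (subtract the minimum, divide by the range, multiply by 100) is an assumption about what "scaling by range" means here, and the toy coefficients are hypothetical.

```python
def rescale_0_100(dc):
    """Map dissimilarities linearly onto the interval from 0.0 to 100.0."""
    lo, hi = min(dc.values()), max(dc.values())
    return {pair: 100.0 * (d - lo) / (hi - lo) for pair, d in dc.items()}


def rank_order(dc):
    """Pairs sorted from most similar to least similar."""
    return sorted(dc, key=lambda pair: (dc[pair], pair))


# Hypothetical coefficients for four objects.
toy_dc = {(0, 1): 0.32, (0, 2): 1.05, (0, 3): 0.51,
          (1, 2): 0.74, (1, 3): 0.20, (2, 3): 0.55}
scaled = rescale_0_100(toy_dc)

# The order in which pairs would be considered for fusion is unchanged ...
same_order = rank_order(toy_dc) == rank_order(scaled)

# ... and taking the max before or after rescaling picks the same pair.
furthest_raw = max([(0, 2), (1, 2)], key=lambda p: toy_dc[p])
furthest_scaled = max([(0, 2), (1, 2)], key=lambda p: scaled[p])
```

Only the numerical labels on the dendrogram's scale change; the sequence of fusions does not.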

6. From Dendrogram to Cluster Profiles 

Having obtained the dendrogram in Drawing 1, and
recalling that the purpose here was to highlight the
interactions among variables, it remained only to select a level
of clustering, identify the partition of countries there,
plot the distributions of variables within each cluster, and
compare these plots. But several iterations of this process
were required before the interactions among variables began
to appear.

[Drawing 1. Dendrogram in Highlighting Demonstration. Horizontal scale: 0.0 to 100.0. The drawing is not recoverable from the scan.]

The first attempt was at level k = 3. Here the
United States appeared alone in one cluster, and the other
two clusters contained 41 countries and 43 countries
respectively. Apparently the United States was alone because
of its extreme values in Variables 2 and 3. At this point
the United States was evaluated as an outlier and was not
included in subsequent comparisons. Within each of the other
two clusters, the mean and standard deviation of each of the
five variables were computed. After a short perusal of these
20 statistics it became apparent that no interactions among
variables were highlighted at this level. Within four of the
five variables, the two means were displaced from each other
by less than the sum of their standard deviations.
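The comparison described here, cluster means set against the sum of the cluster standard deviations, can be sketched in a few lines. This is an illustration only: the toy values are hypothetical, the function names are not from the thesis, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which form was computed.

```python
import statistics


def profile(cluster_rows):
    """Return [(mean, standard deviation), ...], one pair per variable."""
    return [(statistics.mean(col), statistics.pstdev(col))
            for col in zip(*cluster_rows)]


def well_separated(p, q):
    """For each variable: do the means differ by more than the summed std devs?"""
    return [abs(mean_p - mean_q) > sd_p + sd_q
            for (mean_p, sd_p), (mean_q, sd_q) in zip(p, q)]


# Hypothetical clusters of three objects each, measured on two variables.
cluster_a = [[0.10, 50.0], [0.14, 70.0], [0.12, 60.0]]
cluster_b = [[0.13, 300.0], [0.17, 340.0], [0.15, 320.0]]

profile_a, profile_b = profile(cluster_a), profile(cluster_b)
flags = well_separated(profile_a, profile_b)

# Two clusters x five variables x two statistics, as counted in the text.
statistics_at_k3 = 2 * 5 * 2
```

In the toy data the first variable overlaps between the clusters while the second separates them cleanly, which is the kind of contrast the k = 3 level failed to show for four of the five real variables.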

For the second iteration, level k = 7 was chosen.
Again the pair-wise displacements between means were compared
to standard deviations. The standard deviations were definitely
smaller here than they had been at the k = 3 level:
within-cluster homogeneity had improved. Between-cluster
heterogeneity had improved to a lesser extent: many of the means
were well separated but several others were not.

For the third iteration, level k = 8 was chosen. Here
there were three clusters containing only one country each.
All three were dismissed as outliers, leaving five clusters
for further study. Within each of these five clusters, the
mean and standard deviation of each of the five variables
were computed, using a straightforward computer program.
These statistics are listed in Table II and displayed
graphically in Table III. After a relatively brief perusal of
Table III several possible interactions among the variables
came to mind. Then a quick double-check of Table II confirmed
that four of those possible interactions were probable
interactions. These probable interactions are listed in
Table IV.

None of these "probable interactions" was verified
mathematically. The first one would have been relatively
easy to check out, by computing 10 correlation coefficients.
But the others would have required considerably more ingenuity.
Since the purpose here was to demonstrate a new application of
cluster analysis rather than to deduce substantive results,
mathematical verification was considered beyond the scope of
this thesis.
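The figure of 10 correlation coefficients matches the number of distinct pairs among five variables (5 x 4 / 2 = 10). Assuming that is what was meant, the check could be sketched as below; the Pearson coefficient is an assumed choice, and the data are hypothetical values invented for the example rather than taken from Table I.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_table(columns):
    """Return {(v, w): r} for every pair of variables, numbered from 1."""
    return {(v + 1, w + 1): pearson(columns[v], columns[w])
            for v, w in combinations(range(len(columns)), 2)}


# Hypothetical measurements: five variables observed on six objects.
columns = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [6, 5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4, 6],
    [2, 1, 4, 3, 6, 5],
]
table = correlation_table(columns)
```

Pairwise coefficients of this kind can only confirm two-way interactions, which is consistent with the remark that the remaining interactions would need more ingenuity to verify.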

However, there was a further step, within the scope
of this thesis, that might have been pursued but was not. It
would have been logical to proceed next to another cluster
level (perhaps k = 13) and again look for interactions. Such
reiterations might well confirm or refine the interactions
already deduced, and highlight additional interactions as well.


B. SECOND APPLICATION: TO CLASSIFY COUNTRIES OBJECTIVELY
1. General Description

In this application, the user presumes to understand
the variables used, and the interactions among these variables,
at least on a superficial level. The purpose here is not to
research these variables, but rather to objectively classify the
countries. The previous application (to highlight interactions
among variables) produced a dendrogram only as an intermediate
step before producing "cluster profiles" as the final product.
But the current application seeks only the dendrogram itself,





TABLE II. CLUSTER PROFILES OUTPUT FROM HIGHLIGHTING DEMONSTRATION

[Table not recoverable from the scan. Columns: NAME, CODE, VARIABLE 1, VARIABLE 2, VARIABLE 3, VARIABLE 4, VARIABLE 5. The member countries of each cluster are followed by a CLUSTER PROFILE row; the United States, Uruguay, and the Soviet Union each appear as a singular (one-country) cluster.]
OVUDVOO OVO VO DVUGD VOI OOVOVWVWDVOVWVOIOO 
OPO OWA OPM OMODONW = OMAN OM Od 
POs AQUMOAMQHONENMNABODMON MON ath 
WAN an] od wd ml md 


DOODN AWS OF FF NWO DONT DON OOWD @ 
e®*eeeeeteeee¢#eeeeeee#eee*e6e¢#8 ¢ 8 6 @ 
MHO DOW Nt OAIN GOO FOAM Mm MUA Ot 
UD) et RON mt EO et et RELA NLA OANA ate 
r= pad 


tet AOE NBO MTOM MONORNOMaRD 
ADO ONVNONRMOARNODDOOONOFMOOOON 
ODO OOOO DODD ODDVO OOD ODVO9O900O00O 


OOO ODOOD ODO MO OOOO OQMDO OVO 090000O 


WNOWN OOO KHON NOONMO OO OONAMAQM 


SPUN ORQOMNNMHNODMNNORWM CAO BAR OMMM 


MO B= PULA LAU UNO BLN OR FUN COM OD OPM OST S 


BG 

oO z < 
< a < a 
— [ag a j& al 
> Tih I <d Oa) 2 Get 
IT wg ei) ZINN = CE Sri 
J recta DIY NOD wedteu 2 © a << 
NIeOdIde>awoodzsnz a XL wow 


Vee. 2222 Jee ea Ia vYOMOUSIOL Ss 
AZ ™“ODOwWAToDITOwWrarazvyudadqrarw 
PMZORFOYCNE DY TOOL F AHH AIOOUDWN 


a 


ee 086 


sDEVS 


CLUSTER PR@EI LE 


HO 
e 
Cn 


CHO? 


CO 


2125+ 


428+ 





TABLE III

CLUSTER PROFILES, OUTPUT FROM HIGHLIGHTING DEMONSTRATION, GRAPHED

[Graphs not reproduced. Each cluster profile is plotted on one scale per variable.]

Variable 1. Concen. of Population in Cities         min = 0.0, max = 0.21
Variable 2. Radios per 1000 Population              min = 5.4, max = 1233.5
Variable 3. Students in Higher Ed. per Mil. Pop.    min = 6.0, max = 28400
Variable 4. Ethno-Linguistic Fraction.              min = 0.0, max = 0.926
Variable 5. Press Freedom Index
TABLE IV 


RESULTS OF HIGHLIGHTING DEMONSTRATION: 
PROBABLE INTERACTIONS AMONG VARIABLES 
DEDUCED AT CLUSTER LEVEL k = 8 


1. There appear to be no significant pair-wise correlations
(either positive or negative) among the five variables.

2. A high value in Variable 3 tends to be accompanied by a
high value in Variable 5. But the inverse and converse
are not true (i.e., a high value in Variable 5 does not
imply a high value in Variable 3, and a low value in
Variable 3 does not imply a low value in Variable 5).

3. A very high value in Variable 4 tends to be accompanied
by a low value in Variables 1, 2, and 3.

4. The combination of a high value in Variable 1 and a low
value in Variable 4 tends to be accompanied by a high
value in Variable 5.

For ready reference, the variable names are:

Variable 1. Concentration of Population in Cities, 1965
Variable 2. Radios per 1000 Population, 1965
Variable 3. Students in Higher Education (Third Level)
            per One Million Population, 1965
Variable 4. Ethno-Linguistic Fractionalization
Variable 5. Press Freedom Index, 1965





to objectively confirm, or perhaps modify, the user's previous,
subjective, classifications.

2. Need for Demonstration

People commonly think about the countries of the world 
as members of clusters. They use labels like "the Western 
World", "the Communist Bloc", "the Have's" and "the Have-not's" 
every day, and they frequently hear more esoteric terms like
"tri-polar world", "five-polar world" and "spheres of influence", 
all of which have classificatory overtones. 

No doubt such classifications are convenient and useful; 
but as they exist now, many are also subjective and confusing. 
When two speakers discuss the behavior of "the Communist Bloc" 
without first enumerating the members of that bloc, they may 
disagree violently until they discover that one of them includes 
Cuba and Chile in his definition but excludes Yugoslavia, while
the other does the reverse. If our classifications of
countries are useful but subjective, it would seem desirable
to make them more objective.

Imagine that a political scientist in the State
Department wished to inject some objectivity into the terms 
"Western World", "Communist Bloc", "Soviet Bloc", etc. His 
first step would probably be to identify the several theories 
(form of government, internal economic system, external 
political ties, external economic ties, etc.) that are commonly 
used to define the terms in question. Then he would probably 
select one of these theories for quantification and search out 
measurable factors (preferably statistics that had already
been measured) with which to express it.

For example, imagine that he selected the theory that
external economic ties are the prime mover in the concept of
bloc membership. Then his search for measurable factors would
certainly lead to statistics such as level of foreign aid
received from every other country, value of imports received
from every other country, and value of exports sent to every
other country, each of these statistics prorated against the
host country's GNP and/or population.

The final step for our State Department researcher 
would be to combine these measurable factors mathematically 
so as to output a bloc membership label for each country of 
the world; that is, he would write a "factors-to-bloc
transformation". If he were not acquainted with Cluster Analysis,
he would try to write a single function of the form

$$\text{bloc membership} = F(\text{aid}_{US},\ \text{aid}_{USSR},\ \ldots,\ \text{trade}_{US},\ \text{trade}_{USSR},\ \text{trade}_{JAPAN},\ \text{trade}_{COM.\,MKT.},\ \text{GNP},\ \text{population},\ \text{etc.})$$

where "bloc membership" is a discrete variable which can take
on three or perhaps five predetermined values. But such a
function would probably be crippled by two weaknesses: excessive
complexity and theoretical inadequacy. The reader can
certainly visualize how complicated such a function would have
to be in order to have broad applicability. Moreover, no
matter how complex the function, it would necessarily ignore
an obvious fact about blocs of countries: two countries can
be closely bound in a bloc not by economic dependency on each
other but by their simultaneous economic dependency on an
intermediate country.

Cluster Analysis has far more potential as a "factors-to-bloc"
transformation. It does not share the dual weaknesses
of the functional transformation. First of all, the ability
for two countries to cluster through an intermediary is
intrinsic to every clustering algorithm (so long as the
Complete Link-Furthest Neighbor sorting strategy is not used).
And secondly, Cluster Analysis requires that the user define
only a transformation from measurable factors to a pair-wise
Dissimilarity Coefficient rather than a transformation from
measurable factors to bloc membership. Surely the former
should be less complex than the latter.

3. Choice of Data

To demonstrate this application, the author again
usurped the role of State Department political scientist and
selected the following statistics as measurable factors with
which to objectively classify countries into blocs:

Variable 1. Gross National Product per Capita, 1965

Variable 2. Trade as percentage of Gross National Product, 1965

Variable 3. Soviet Aid per Capita, 1954 - 1965

Variable 4. U.S. Economic Aid per Capita, 1958 - 1965

Values of these variables for each of 85 countries are listed
in Table V.





TABLE V - DATA INPUT TO OBJECTIVE CLASSIFICATION DEMONSTRATION 


[Tabulated values not reproduced. Columns: NAME, COUNTRY CODE, VARIABLE 1, VARIABLE 2, VARIABLE 3, VARIABLE 4; one row for each of the 85 countries.]





Here again the unavailability of data made the
demonstration artificial. As was asserted in the preceding
sections, classification of countries by external economic ties
should depend primarily on pair-wise data. But this researcher
did not have access to any standardized, comprehensive pair-
wise data. Variables 3 and 4 above are pair-wise but not
comprehensive. Foreign aid is provided in substantial amounts
by countries other than the United States and the Soviet Union.
But this researcher could not locate any but the most piecemeal
data on other donors. Variable 2 above is not pair-wise
at all. Pair-wise trade data is collected by the International
Monetary Fund, and their data is both standardized
and reasonably comprehensive. But that data is not made
available to the public in comprehensive form.¹ Without
pair-wise trade data it is virtually impossible to construct
a logical theory for blocking countries by external economic
ties. Nevertheless, this demonstration was carried through
to completion because its purpose is not to deduce substantive
results but to demonstrate a procedure. And that procedure
can still be demonstrated using the foreign aid data (Variables
3 and 4) which is pair-wise, although incomplete.

¹After completion of the research described here, the
author did obtain access to the IMF data and began a Cluster
Analysis on it. But the results were not obtained in time
to incorporate them in this thesis. See Section II.B.7 for
a description of the work in progress. The data was obtained
through the Inter-University Consortium for Political Research,
on computer tape. The reason why the data is not generally
available was obvious: its sheer magnitude. For purposes of
data collection, the IMF defines 207 countries, and 207
countries taken two at a time produce 21,321 trading combinations.
The complete data file contains almost 500,000 numbers.

4. Choice of Dissimilarity Coefficient

This section describes the process of converting the
data in Table V to a matrix of Dissimilarity Coefficients.

When Cluster Analysis was used to highlight the inter- 
actions of variables (Section II.A.4 above), the choice of 
DC was motivated by a desire to have all five variables 
weighted equally, to prevent the user's preconceptions from 
affecting the results. Precisely the opposite is true here.
Here the author presumed that he already knew how the 
variables interact. He wanted to incorporate that knowledge 
into the DC. The DC was constructed using the following 
rationale: 

First of all, it was decided that the DC between a 
foreign aid donor and any other country should be inversely 
related to the level of that foreign aid. Thus, for a first
cut, the formulas

$$DC(US,i) = \frac{1}{1 + \text{usaid}_i} \quad \text{and} \quad DC(SOV,i) = \frac{1}{1 + \text{sovaid}_i}$$

were considered. Next it was observed that 27 of the 85 
countries received foreign aid from both the United States 
and the Soviet Union. To incorporate relative dependency 
into the formulas, it was decided to insert a ratio of aid
levels. Hence the following formulas were considered:

$$DC(US,i) = \frac{1 + \text{sovaid}_i}{1 + \text{usaid}_i} \quad \text{and} \quad DC(SOV,i) = \frac{1 + \text{usaid}_i}{1 + \text{sovaid}_i}$$


These formulas seemed reasonable except that the same level
of foreign aid has less impact on a rich country than it
does on a poor country. So it was decided to insert GNP_i as
a scaling factor wherever an aid term appeared in either
formula. But this insertion tended to greatly reduce the
size of the aid terms with respect to the "1" terms. Therefore
the "1" terms were arbitrarily reduced to "0.1",
producing

$$DC(US,i) = \frac{0.1 + \text{sovaid}_i/\text{GNP}_i}{0.1 + \text{usaid}_i/\text{GNP}_i} \quad \text{and} \quad DC(SOV,i) = \frac{0.1 + \text{usaid}_i/\text{GNP}_i}{0.1 + \text{sovaid}_i/\text{GNP}_i}$$

The fact that DC(US,i) and DC(SOV,i) are reciprocals and the
fact that they are dimensionless had intuitive appeal. The
only apparent shortcomings were the two imposed by unavailability
of data: aid from other countries is ignored, and
pair-wise trade is ignored. Although a total trade figure
was available, there seemed to be no logical way to substitute
it for the missing pair-wise figure.
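
The final pair of formulas can be checked numerically. What follows is a minimal Python sketch, not the author's program: the function name `dc_superpower`, its argument names and the sample figures are hypothetical, and only the "0.1" constant and the ratio structure are taken from the text.

```python
def dc_superpower(usaid, sovaid, gnp, donor, floor=0.1):
    """Dissimilarity between a donor ("US" or "SOV") and recipient country i.

    usaid, sovaid: aid per capita received by country i (assumed units).
    gnp: GNP per capita of country i, used as the scaling factor.
    floor: the "0.1" term that replaced the original "1" term.
    """
    us_term = floor + usaid / gnp
    sov_term = floor + sovaid / gnp
    if donor == "US":
        return sov_term / us_term
    if donor == "SOV":
        return us_term / sov_term
    raise ValueError("donor must be 'US' or 'SOV'")


# Hypothetical recipient: 10 units of U.S. aid, 2 of Soviet aid, GNP per capita 200.
d_us = dc_superpower(10.0, 2.0, 200.0, "US")
d_sov = dc_superpower(10.0, 2.0, 200.0, "SOV")

# A country receiving no aid from either donor is equally distant from both.
d_none = dc_superpower(0.0, 0.0, 500.0, "US")
```

In this sketch the recipient of mostly U.S. aid ends up closer to the United States (coefficient below 1) and farther from the Soviet Union (coefficient above 1), and the two coefficients multiply to 1, which is the reciprocal property noted in the text.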

At this point it was verified that the DC(US,i)
formula would apply to every country dyad in which the United
States is a member, except for the United States - Soviet Union
dyad. Similarly the DC(SOV,i) formula applies to every
country dyad in which the Soviet Union is a member, except
for the United States - Soviet Union dyad. Thus it remained
to construct formulas for the United States - Soviet Union
dyad and for all dyads in which neither the United States nor
the Soviet Union is a member. Hopefully the same formula
would apply to both.
But here the lack of pair-wise trade data was really
crippling. The only pair-wise economic ties of any significance
involved pair-wise trade. The only logical formula
necessarily involved the inverse of pair-wise trade. There
seemed no natural way to use the available data on total
trade. Finally, in desperation it was rationalized that a
country whose foreign trade is large with respect to its GNP
tends to have closer ties with another country in the same
situation. It was decided that the nucleus of the formula
should be

$$DC(i,j) \propto |\text{trade}_i - \text{trade}_j|$$


But because of the weak theory here as compared to the
rigorous formulas for DC(US,i) and DC(SOV,i), it was decided
to diminish the effect of trade difference when foreign aid
recipients are involved. Hence it was decided to expand the
formula to

[Equation illegible in this copy: DC(i,j) is expressed in terms of |trade_i - trade_j| and the aid terms sovaid_i, usaid_i, sovaid_j and usaid_j.]





The dearth of theoretical foundation here is admitted. It
casts suspicion on the values in the DC matrix and on the
dendrogram finally obtained. But the reader is again reminded
that the purpose here is to demonstrate the procedure, not to
deduce substantive results.

Turning from the substance to the procedure, there is
a significant departure from normal Cluster Analysis procedure,
taken above, that warrants explanation. Two completely different
DC formulas have been developed, one to be used when the
United States or Soviet Union is a dyad member and another
to be used the rest of the time. The mathematical significance
of this duality is that the DC formulas, taken collectively,
produce gross violations of the metric inequality, which is

$$DC(a,b) + DC(b,c) \geq DC(a,c) \quad \text{for all } a,\ b,\ c.$$

Generally, it is desirable although not essential that a matrix
of DC's satisfy the metric inequality. When they do not, the
clustering algorithm can be expected to produce high
"distortion" between the matrix of DC's and the dendrogram.
(Loosely defined, "distortion" is the difference between
DC(i,j) and the level at which country i and country j cluster
together in an agglomerative algorithm.) But of what significance
is high distortion? The word carries derogatory
connotations, but is distortion really undesirable in Cluster
Analysis? This author maintains that it depends on the purpose
of the clustering. In the "highlighting of variables" application,
distortion was not desirable: figuratively speaking,





each country had been plotted in five-dimensional space and
the clustering algorithm was searching for natural clusters,
as plotted. But in this "objective classifying of countries"
application, distortion is natural: the original pair-wise
similarities specified in the DC matrix cannot be expected
to be representable in Euclidean space, and during the
clustering it is desired that these original similarities be
affected by intermediate countries. With this reasoning, it
is asserted that violation of the metric inequality is necessary
and thus the use of two or more DC formulas is acceptable.

Using the formulas developed above, the 340 data points
in Table V were transformed to a matrix of Dissimilarity
Coefficients.
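
Whether a given matrix of DC's violates the metric inequality can be tested directly. The sketch below is an illustration only, assuming the matrix is held as a square list of lists; the function name `metric_violations` and the three-country matrix are hypothetical.

```python
from itertools import permutations


def metric_violations(dc):
    """Return every ordered triple (a, b, c) with dc[a][b] + dc[b][c] < dc[a][c]."""
    n = len(dc)
    return [
        (a, b, c)
        for a, b, c in permutations(range(n), 3)
        if dc[a][b] + dc[b][c] < dc[a][c]
    ]


# Hypothetical matrix: countries 0 and 2 are each close to country 1
# but far from each other, so the route through 1 is "shorter".
dc = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
violations = metric_violations(dc)
```

Each violating triple identifies an intermediate country b through which a and c are more closely tied than they are directly, which is the situation the text argues should be allowed to influence the clustering.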



5. Choice of Algorithm


Here, as in the highlighting demonstration, the first 
decision point was to choose among the agglomerative, divisive 
and reallocative algorithms. Again the divisive algorithms 
were discarded because they are not as well documented as
their agglomerative counterparts. The reallocative algorithms
did not apply because they require that the DC be a metric.
Hence the agglomerative algorithm was selected. 

Drawing 2. Dendrogram Output from Objective Classification Demonstration
[Dendrogram not reproduced. Cluster level scale: 0.0 to 20.0.]

The final decision was to select a sorting strategy.
The Complete Linkage-Furthest Neighbor strategy was eliminated
from consideration here; it does not permit any chaining,
which is desirable in this application. On the other end of
the spectrum, the Single Linkage-Nearest Neighbor sorting
strategy maximizes chaining, often to the extent that natural
clusters are obscured. For this application it was determined
to use one of the compromise sorting strategies. Among
these, Group Average sorting seemed to correspond with the
concepts of bloc membership, ratios of foreign aid, etc. It
is expressed mathematically as


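$$DC(k,\ i \cup j) = \frac{n_i}{n_i + n_j}\,DC(k,i) + \frac{n_j}{n_i + n_j}\,DC(k,j)$$

where n_i and n_j are the numbers of countries in clusters i and j.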

The dendrogram in Drawing 2 was obtained using the 
Group Average sorting strategy in an agglomerative algorithm. 
But once again the reader is cautioned that the results are 
suspect.
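
For readers who want to experiment, the following is a minimal Python sketch of an agglomerative algorithm with Group Average sorting. It is not a transcription of the program listed at the end of this thesis; it assumes the usual size-weighted update for Group Average sorting, and the function name and the four-item matrix are hypothetical.

```python
def group_average_clustering(dc):
    """Agglomerate items using a size-weighted (group average) update.

    dc: symmetric square matrix (list of lists) of dissimilarities.
    Returns a list of merges as (level, members_a, members_b).
    """
    # Active clusters: cluster id -> tuple of original item indices.
    clusters = {i: (i,) for i in range(len(dc))}
    # Dissimilarities between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dc[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(dc)
    while len(clusters) > 1:
        # Join the two closest clusters.
        (a, b), level = min(d.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        merges.append((level, clusters[a], clusters[b]))
        # Distance from every other cluster e to the merged cluster:
        # the average of its distances to a and b, weighted by cluster size.
        new = {}
        for e in clusters:
            if e in (a, b):
                continue
            d_ea = d[(min(e, a), max(e, a))]
            d_eb = d[(min(e, b), max(e, b))]
            new[e] = (na * d_ea + nb * d_eb) / (na + nb)
        # Drop all entries involving a or b, then register the merged cluster.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        for e, v in new.items():
            d[(e, next_id)] = v  # e is always smaller than next_id
        next_id += 1
    return merges


# Hypothetical dissimilarities among four items.
dc = [
    [0.0, 2.0, 6.0, 10.0],
    [2.0, 0.0, 5.0, 9.0],
    [6.0, 5.0, 0.0, 4.0],
    [10.0, 9.0, 4.0, 0.0],
]
merges = group_average_clustering(dc)
levels = [m[0] for m in merges]
```

The last merge level in this example, 7.5, is the plain average of the four dissimilarities between the pair (0, 1) and the pair (2, 3), which is what distinguishes Group Average sorting from the nearest-neighbor and furthest-neighbor extremes.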

6. Explanation of Dendrogram 

Despite the admitted artificiality of the results 
obtained here, the layman might appreciate an explanation of 
the information available in any dendrogram produced by 
Cluster Analysis.

The key to reading a dendrogram is the concept of
"cluster level." Upon specifying a cluster level, the
following information can be read from the dendrogram: the
number of clusters and the countries contained in each cluster.
That is, there is a correspondence from cluster level to a
partition of the countries.

The scale at the bottom of Drawing 2 is a cluster
level scale. Note that the minimum value of cluster level is
0.0 at the far left and the maximum value is 20.0 at the far
right. A low cluster level specifies a partition having many
small clusters, while a high cluster level specifies a partition
having a few large clusters. Thus cluster level can be
thought of as a measure of the largest dissimilarity (or,
equivalently, the weakest bond) present within any cluster in
the partition.

For example, consider cluster level 0.0, the minimum
observed cluster level in Drawing 2. At cluster level 0.0,
the 85 countries are partitioned into 77 clusters. Seventy-
two of these 77 contain only a single country. Four of the 77
contain exactly two countries. And one cluster contains 5
countries: Canada, Ireland, Switzerland, Sweden and Denmark.

Since 0.0 is the minimum observed cluster level, we may conclude
that the strongest possible bonds exist within every cluster.
Specifically, we may conclude that Canada, Ireland, Switzerland,
Sweden and Denmark are bound together by the tightest possible
economic ties. Our mathematical model will not separate them
even at the lowest cluster level.

Consider next a slightly higher cluster level, say 1.3.
Here we are permitting slightly weaker bonds to be present
within clusters. We find that the 85 countries are here
partitioned into 52 clusters. Thirty-three of those clusters
contain a single country, twelve contain exactly two countries,
five contain exactly three countries, one contains four
countries and one contains nine countries. In the nine-country
cluster, Canada, Ireland, Switzerland, Sweden and Denmark have
been joined by New Zealand, South Africa, France and Australia.
We may conclude that slightly weaker economic ties bind the
four new countries to the original five.

Similar inferences can be drawn from any dendrogram
produced by Cluster Analysis.
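
The correspondence from cluster level to partition can be sketched in a few lines of Python. This is an illustration under assumptions: the dendrogram is taken to be stored as a list of merges (level, members_a, members_b), and both that layout and the function name `partition_at_level` are hypothetical.

```python
def partition_at_level(n_items, merges, level):
    """Return the partition obtained by applying every merge at or below `level`."""
    parent = list(range(n_items))

    def find(x):
        # Follow parent links to the representative of x's cluster.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for merge_level, members_a, members_b in merges:
        if merge_level <= level:
            parent[find(members_a[0])] = find(members_b[0])

    groups = {}
    for item in range(n_items):
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Hypothetical dendrogram on four items.
merges = [
    (2.0, (0,), (1,)),
    (4.0, (2,), (3,)),
    (7.5, (0, 1), (2, 3)),
]
low = partition_at_level(4, merges, 0.0)
middle = partition_at_level(4, merges, 3.0)
high = partition_at_level(4, merges, 8.0)
```

Raising the level from 0.0 to 3.0 to 8.0 takes this example from four single-item clusters, to one pair plus two singles, to a single cluster, mirroring the progression from many small clusters to a few large ones described in the text.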

7. Work in Progress


Throughout this second demonstration of Cluster
Analysis it has been emphasized that the unavailability of
pair-wise trade data made the demonstration artificial. But
this artificiality can soon be removed. Pair-wise trade data
was recently provided to this author through the Inter-
University Consortium for Political Research [Ref. 9]. Time
will not permit this author to complete a Cluster Analysis
on the data, but if another researcher chooses to undertake
it, the following plan of attack is suggested.
a. Step 1 - Reduce data file to manageable size

The ICPR data file contains approximately 333,720
pair-wise trade data: annual trade values, in millions of
U.S. dollars, for the years 1958 through 1968, among 207 different
"countries." Many of these "countries" are actually
colonies and many such have negligible foreign trade except
with a single "sponsor country." The logical first step is
to selectively reduce the size of the data file by eliminating
the insignificant "countries", and by selecting a single year
and eliminating the other nine. It is recommended that all
"countries" be eliminated except the 136 nations having a
population of one million or more and those smaller nations
having membership in the United Nations as of 1968. These
136 nations are listed on pages 1 through 4 of Ref. 8. It is
further recommended that the year 1967 be used and the rest
be eliminated temporarily. The author has determined that,
through the first one-sixth of the file, 1967 has fewer zero
entries than any other year (a zero entry signifies either
trade less than 100,000 dollars or missing data). This
selective reduction of the data should reduce the file length
to about one eighth its original length.
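
Step 1 amounts to a single sequential pass over the file. The sketch below illustrates that pass in Python; the record layout (a dictionary with year, exporter, importer and value) is an assumption made for illustration, since the actual ICPR tape format is not described here, and the function name is hypothetical.

```python
def reduce_records(records, keep_countries, year):
    """Yield only the records for `year` whose two countries are both retained.

    The input is consumed once, front to back, as a tape would be read.
    """
    keep = set(keep_countries)
    for rec in records:
        if rec["year"] == year and rec["exporter"] in keep and rec["importer"] in keep:
            yield rec


# Hypothetical records; "X" stands for an eliminated "country".
sample = [
    {"year": 1967, "exporter": "A", "importer": "B", "value": 12.0},
    {"year": 1966, "exporter": "A", "importer": "B", "value": 11.0},
    {"year": 1967, "exporter": "A", "importer": "X", "value": 0.3},
]
reduced = list(reduce_records(sample, {"A", "B"}, 1967))
```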
b. Step 2 - Sort and combine data 

Preparatory to sorting the data, the reduced data
file should be stored on either a disk or a data cell rather
than magnetic tape. The ICPR normally provides the data on
tape, and tape is a satisfactory input to the data reduction
process in step 1 because that process can be sequential,
reading the file once from beginning to end. However, the
sorting process about to be described cannot read the file
sequentially, and magnetic tape is a very inefficient input
to processes that must search the data.

The ICPR data file does not list one trade figure
per country dyad per year. It lists up to four figures,
namely,

1. Value of exports from i to j, as reported by i
2. Value of exports from i to j, as reported by j
3. Value of exports from j to i, as reported by j
4. Value of exports from j to i, as reported by i



Hopefully numbers 1 and 2 are approximately equal and numbers
3 and 4 are approximately equal. If so, then total trade
between i and j is the sum of 1 and 3. It is recommended
that this approximate equality be assumed for the initial run
of this "sort and combine" process. Then the process is
simple: search the file for the first record involving the
i-j dyad; identify it with respect to direction of trade,
regardless of reporting country; continue searching for the
second record involving the i-j dyad; identify it with respect
to direction; if the directions are opposite then sum the two
values and store them; if the directions are the same then
ignore the second value and continue searching for the third
record; and so on. The reason why shortcuts are in order for
the initial run is that this "sort and combine" process will
have to be performed 9180 times (136 countries, taken two at
a time, yields 9180 different combinations).
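
The "sort and combine" rule and the dyad counts quoted in this thesis can be sketched as follows. This Python illustration replaces the repeated file searches with one pass over an in-memory dictionary, which is a substitution made for brevity and not the procedure described above; the record layout (exporter, importer, reporter, value) is hypothetical. A dyad with a figure for only one direction is given that figure as its total, a case the text does not address.

```python
from math import comb


def total_trade(records):
    """Combine directional export figures into one total per country dyad.

    records: iterable of (exporter, importer, reporter, value) tuples.
    """
    first_report = {}
    for exporter, importer, _reporter, value in records:
        # Keep only the first figure seen for each direction of trade,
        # regardless of which country reported it.
        first_report.setdefault((exporter, importer), value)
    totals = {}
    for (exporter, importer), value in first_report.items():
        dyad = tuple(sorted((exporter, importer)))
        totals[dyad] = totals.get(dyad, 0.0) + value
    return totals


records = [
    ("A", "B", "A", 10.0),
    ("A", "B", "B", 11.0),  # second report of the same direction: ignored
    ("B", "A", "B", 4.0),
    ("A", "C", "A", 7.0),
]
totals = total_trade(records)

# Number of dyads among 136 retained nations and among all 207 "countries".
dyads_136 = comb(136, 2)
dyads_207 = comb(207, 2)
```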
c. Step 3 - Choose a Dissimilarity Coefficient

The following formula is recommended as a DC, at
least initially:

$$DC(i,j) = \frac{1}{1 + \text{trade}_{ij}}$$


More elaborate formulas can be developed later by incorporating 
the rationale in Section II.B.4 of this thesis. 
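
As an illustration of Step 3, the sketch below builds a full DC matrix from dyad trade totals using the recommended formula. The function name and the input layout (a dictionary keyed by country pair) are hypothetical, and treating a dyad with no recorded trade as zero trade is an assumption.

```python
def dc_matrix(countries, totals):
    """Build a symmetric matrix with DC(i,j) = 1 / (1 + trade_ij).

    countries: list of country names fixing the row and column order.
    totals: dict mapping (name_a, name_b) to total trade for that dyad.
    Dyads absent from `totals` are treated as zero trade, so DC = 1.0.
    """
    index = {name: k for k, name in enumerate(countries)}
    n = len(countries)
    dc = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for (a, b), trade in totals.items():
        i, j = index[a], index[b]
        dc[i][j] = dc[j][i] = 1.0 / (1.0 + trade)
    return dc


# Hypothetical totals: A and B trade 3.0 units; C trades with neither.
dc = dc_matrix(["A", "B", "C"], {("A", "B"): 3.0})
```

With this formula the coefficient falls from 1.0 toward 0.0 as pair-wise trade grows, so heavy trading partners are the first to cluster.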
d. Step 4 - Choose a Clustering Algorithm

It is recommended that an agglomerative algorithm
with Group Average sorting strategy be used, for the same
reasons that it was selected in Section II.B.5 above. This
involves making the following additions and substitutions in
the computer program listed at the end of this thesis:
immediately before the DO statement [statement illegible in this copy]
insert the following two statements:

      RATA = S(A,A)/(S(A,A)+S(B,B))
      RATB = S(B,B)/(S(A,A)+S(B,B))

In place of

      DS(E) = AMAX1(S(E,A),S(E,B))

substitute

      DS(E) = RATA*S(E,A) + RATB*S(E,B)

Similarly, in place of

   70 DS(E) = AMAX1(S(A,E),S(B,E))

substitute

   70 DS(E) = RATA*S(A,E) + RATB*S(B,E)

And finally, in place of

      DS(E) = AMAX1(S(E,A),S(B,E))

substitute

      DS(E) = RATA*S(E,A) + RATB*S(B,E)





LIT. CONCLUSION 


This thesis has demonstrated two metentsaal uses of Cluster 
Analysis in which the nations of the world are treated as 
measurable objects. The substantive results obtained in each 
demonstration are not presented as percaetions: they were 
derived incidentally while demonstrating methods. It is 
asserted that the two uses illustrated here, markedly differ- 
ent in several respects, are representative of a wide range of 
applications for Cluster Analysis in the fields of political 
science and international relations. Although Cluster Analysis 
was developed for the physical sciences and has so far received 
scant attention outside that context, it is readily adaptable 
to the social sciences. In particular, it is extremely well 
suited to model building and statistical analysis involving 
the nations of the world. As such, it warrants the attention 
of the U.S. State Department. 







APPENDIX A 


DATA 


Except for three data points, all data used in this thesis 
were made available by the Inter-University Consortium for 
Political Research. The data were originally collected by 
Charles Lewis Taylor and Michael C. Hudson. Neither the 
original collectors of the data nor the consortium bear any 
responsibility for the analysis or interpretations presented 
here. 

Following are the precise definitions of the nine vari- 
ables used in this thesis. All definitions are extracted 
verbatim from Ref. 8. 


Variable name: Concentration of Population in Cities, 1965 
Definition: Concentration is defined as: the sum over all 
cities of the squares of the proportion of the total popula- 
tion residing in each city. Concentration is higher the fewer 
cities and the greater the size of the largest city relative 
to the total population. [Ref. 8, p. 16] 
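
As an illustration of this definition, the Python sketch below computes the concentration measure for two hypothetical countries of equal total population. The function name and all population figures are invented for the example and do not come from Ref. 8.

```python
def concentration(city_populations, total_population):
    """Sum over cities of the squared share of the total population in each city."""
    return sum((p / total_population) ** 2 for p in city_populations)


# Hypothetical country of 1000 people with one city of 500.
one_large_city = concentration([500], 1000)

# Same urban population split evenly between two cities of 250.
two_smaller_cities = concentration([250, 250], 1000)
```

The single large city gives the higher value, which agrees with the statement that concentration rises as cities become fewer and the largest city grows relative to the total population.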


Variable name: Radios per 1000 Population, 1965 

Definition: Figures relate to all types of receivers including 
those connected to a re-distribution system. They relate 
either to the number of licenses issued or sets declared or 
to the estimated number of receivers in use. In many countries 
a license may cover more than one receiver in the same 
household. Data exclude television sets. [Ref. 8, p. 32] 







Variable name: Students in Higher Education (Third Level) 
per One Million Population, 1965. 
Definition: Data refer to the enrollment in all institutions 
of education at the third level, i.e., degree granting and 
non-degree granting institutions of both private and public 
higher education of all types. These include universities, 
higher technical schools, teacher training schools, theological 
schools, etc. As far as possible part time students are 
included in the figures but correspondence courses and auditors 
are generally excluded. [Ref. 8, p. 41] 


Variable name: Ethno-Linguistic Fractionalization 
Definition: The main source for this variable (Atlas Narodov 
Mira) makes little distinction between ethnic and linguistic 
differences in its definition and collection of data. Groups 
are determined not by their physical characteristics but by 
their roles, their descent, and their relationships to others. 
An index of fractionalization calculated upon data from Atlas 
does correlate highly with a similar index calculated upon 
linguistic data from other sources, but not quite highly 
enough to be considered the same indicator. Other sources 
used here report only linguistic data. Index of fractionaliza- 
tion was calculated by the following formula: 

$$ F = 1 - \sum_{i} \left(\frac{N_i}{N}\right)\left(\frac{N_i - 1}{N - 1}\right) $$

where $N_i$ = number of people in the ith group 
and $N$ = total population [Ref. 8, p. 46] 
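
The index can be illustrated numerically. The Python sketch below reads the formula as one minus the sum, over groups, of (N_i / N) times ((N_i - 1) / (N - 1)); that reading, the function name and the group sizes are assumptions made for the example.

```python
def fractionalization(group_sizes):
    """Index of fractionalization: 1 - sum_i (N_i / N) * ((N_i - 1) / (N - 1))."""
    n_total = sum(group_sizes)
    same_group = sum((n_i / n_total) * ((n_i - 1) / (n_total - 1))
                     for n_i in group_sizes)
    return 1.0 - same_group


# Hypothetical population of 100 in a single group: no fractionalization.
homogeneous = fractionalization([100])

# Hypothetical population of 100 split evenly between two groups.
two_groups = fractionalization([50, 50])
```

Under this reading the index is the probability that two people drawn at random without replacement belong to different groups, so it is zero for a homogeneous population and approaches one as the number of small groups grows.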







Variable name: Press Freedom Index, 1965 
Definition: This index, created by the School of Journalism, 
University of Missouri, is "designed to measure the indepen- 
dence of a nation's broadcasting and press system and its 
ability to criticize its own local and national governments." 
The index is comprised of the judgements of panels of native 
and foreign newsmen on 23 aspects of the press (e.g., extent 
of legal controls, licensing, government ownership, criticism 
and censorship). For a fuller description, see Ralph L. 
Lowenstein, "PICA (Press Independence and Critical Ability) 
Index: Measuring World Press Freedom," University of Missouri, 
School of Journalism Freedom of Information Center Publication 
#166 (August, 1966). The index, which consists of averages 
of the judges' scores, has a range from -4.00 for less freedom 
to +4.00 for more. [Ref. 8, p. 116] 


Variable name: Gross National Product per Capita, 1965 
Definition: This variable was derived by dividing Gross 
National Product in millions of U.S. dollars by total popula- 
tion in thousands. Gross National Product is reported in 
constant U.S. dollars and refers to gross national product 
even for countries which normally report their national 
accounts in terms of net material product or other concepts. 
[Ref. 8] 


Variable name: Trade as percentage of Gross National Product, 
1965. 
Definition: This variable was derived by dividing total trade 
(imports plus exports, merchandise only) by Gross National 
Product. [Ref. 8, p. 69] 


Variable name: Soviet Aid per Capita, 1954 - 1965 
Definition: This variable was derived by dividing total 
Soviet aid by total population. Total Soviet aid data refer 
to Soviet economic credits and grants to countries in terms 
of thousand U.S. dollars for the period 1954/5 - 1965. 


[Ref. 8] 


Variable name: U.S. Economic Aid per Capita, 1958 - 1965 
Definition: This variable was derived by dividing total 
U.S. economic aid by total population. Total U.S. economic 
aid data refer to grants and loans and are given in millions 
of U.S. dollars for the period July 1, 1958 through June 30, 
1965. [Ref. 8, p. 107] 


The three data points not provided by the ICPR are listed 
below. The ICPR data file listed all three as missing data. 
But in each case this author preferred to introduce an approxi- 
mate (or even erroneous) value rather than eliminate the 
particular country from the Cluster Analysis. Hence the three 
values were estimated in the manner specified. Note that no 
two estimations involved the same country. All countries 
missing two or more data (among the nine variables used) in 
the ICPR data file were omitted from the Cluster Analysis at 


the outset. 







Country: Chile 
Variable name: Radios per 1000 Population, 1965 
Estimated value: 240.0 
Method of estimation: Average of values for Peru and 
Argentina. 

Country: Chad 
Variable: Students in Higher Education (Third Level) per 
One Million Population, 1965 
Estimated value: 230.0 
Method of estimation: Average of values for Mali, Upper Volta, 
Sudan and Cameroon. 

Country: Zambia 
Variable: Students in Higher Education (Third Level) per 
One Million Population, 1965 

Estimated value: 170.0 

Method of estimation: Average of seventeen neighboring 
countries. 
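
The treatment of missing data described in this appendix can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the two neighbor values are invented so that their average matches the reported estimate of 240.0, and the rounding to one decimal place is an assumption.

```python
from statistics import mean


def keep_country(record, max_missing=1):
    """Keep a country only if at most one of its variables is missing (None)."""
    n_missing = sum(1 for value in record.values() if value is None)
    return n_missing <= max_missing


def estimate_from_neighbors(neighbor_values):
    """Estimate a missing value as the average over the chosen neighbors."""
    return round(mean(neighbor_values), 1)


# Hypothetical radios-per-1000 values for two neighboring countries.
estimate = estimate_from_neighbors([180.0, 300.0])

# Hypothetical records: one missing value is tolerated, two are not.
one_gap = {"radios": None, "students": 950.0, "gnp": 480.0}
two_gaps = {"radios": None, "students": None, "gnp": 480.0}
```

A country such as `one_gap` would be retained and its gap filled by `estimate_from_neighbors`, while a country such as `two_gaps` would be omitted at the outset.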







COMPUTER PROGRAM 


Clustering by Agglomerative Algorithm with Complete Link Sorting Strategy 


[Program listing illegible in this copy. The recoverable fragments follow.] 

Array:  S(85,86) 

Comment:  SEARCH MATRIX FOR MIN 

The three alternative statements marked OPTION: 

      DS(E) = AMAX1(S(E,A),S(E,B)) 
      DS(E) = AMAX1(S(A,E),S(B,E)) 
      DS(E) = AMAX1(S(E,A),S(B,E)) 

Comment:  SHUFFLE COUNTRY CODES 

      SUBROUTINE RITMAT(NPRINT) 

      SUBROUTINE RITMIN(NPRINT) 
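
For readers without access to a legible copy of the listing, the strategy named in its title can be sketched in Python. This is not a transcription of the FORTRAN program; it is a minimal illustration of agglomerative clustering with the complete link update, and the function name, data layout and example matrix are all assumptions.

```python
def complete_link_clustering(dist):
    """Agglomerative clustering with the complete link update.

    dist is a symmetric matrix (list of lists) of dissimilarities.
    Returns the merges in order as (members_a, members_b, level).
    """
    n = len(dist)
    clusters = {i: (i,) for i in range(n)}
    # Current dissimilarities, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    while len(clusters) > 1:
        # Search the matrix for the minimum dissimilarity.
        (a, b), level = min(d.items(), key=lambda item: item[1])
        merges.append((clusters[a], clusters[b], level))
        # Update: the merged cluster keeps id a; its dissimilarity to any
        # other cluster e is the larger of the values for a and b.
        for e in clusters:
            if e in (a, b):
                continue
            key_a = (min(a, e), max(a, e))
            key_b = (min(b, e), max(b, e))
            d[key_a] = max(d[key_a], d[key_b])
        # Drop every entry that still refers to the absorbed cluster b.
        d = {key: value for key, value in d.items() if b not in key}
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical dissimilarities among four objects.
example = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
merges = complete_link_clustering(example)
```

The list `merges` records the successive joins and their levels, which is the information needed to draw a dendrogram. Replacing the `max` in the update step by a size-weighted average would give the Group Average strategy.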







LIST OF REFERENCES 


1. Ball, G. H. and Hall, D. J., "A Clustering Technique 
   for Summarizing Multivariate Data," Behavioral Science, 
   v. 12, p. 153-155, 1967. 

2. [illegible] an Instrument for Decision- 
   Making [illegible]: Diagnosis and System Design, 
   MS Thesis, Massachusetts Institute of Technology, 1970. 

3. [illegible] Policy Objectives: A Systematic 
   Approach to Management, Evaluation, Coordination and 
   Resource Allocation, paper presented at the 25th Military 
   Operations Research Society, Intelligence Working Group, 
   New London, Connecticut, 17 June 1970. 

4. Cormack, R. M., "A Review of Classification," Journal of 
   the Royal Statistical Society, v. 134, p. 321-353, Part 3, 
   1971. 

5. Sneath, P. H. A. and Sokal, R. R., Principles of Numerical 
   Taxonomy, Freeman, 1963. 

6. Jardine, N. and Sibson, R., Mathematical Taxonomy, Wiley, 
   1971. 

7. Carlson, P. E. and Hancock, W. J., Law of the Sea: An 
   Application of Cluster Analysis, MS Thesis, Naval Post- 
   graduate School, 1972. 

8. Taylor, C. L. and Hudson, M. C., World Handbook of 
   Political and Social Indicators II, Section I, Cross- 
   National Aggregate Data, Yale University, 1970. 

9. Gillespie, J., and Zinnes, D., World Trade Data: 1958 - 
   1968, Inter-University Consortium for Political Research. 







INITIAL DISTRIBUTION LIST 
                                                     No. Copies 

Defense Documentation Center                              2 
Cameron Station 
Alexandria, Virginia 22314 

Library, Code 0212                                        2 
Naval Postgraduate School 
Monterey, California 93940 

Asst. Professor B. O. Shubert                             1 
Department of Operations Research 
and Administrative Sciences 
Naval Postgraduate School 
Monterey, California 93940 

Asst. Professor E. J. Laurance                            1 
Department of Government and Humanities 
Naval Postgraduate School 
Monterey, California 93940 

LCDR James Richard Lamping, USN                           1 
15 La Playa 
Monterey, California 93940 

Professor R. von Pagenhardt                               1 
Naval Management Systems Center 
Naval Postgraduate School 
Monterey, California 93940 

Mr. Dan Daniels                                           1 
M/MS 
Room 2231 
Department of State 
Washington, D. C. 20520 

Inter-University Consortium for Political                 1 
Research 
Box 1240 
Ann Arbor, Michigan 48106 

Naval Postgraduate School                                 1 
Department of Operations Research 
and Administrative Sciences 
Monterey, California 93940 

Chief of Naval Personnel                                  1 
Pers-11b 
Department of the Navy 
Washington, D. C. 20370 







Security Classification 
DOCUMENT CONTROL DATA - R & D 
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified) 

1. ORIGINATING ACTIVITY (Corporate author) 
   Naval Postgraduate School 
   Monterey, California 93940 

2a. REPORT SECURITY CLASSIFICATION 
   Unclassified 

2b. GROUP 

3. REPORT TITLE 
   Applications of Cluster Analysis to Some Problems of Interest to 
   the U. S. Department of State 

4. DESCRIPTIVE NOTES (Type of report and inclusive dates) 
   Master's Thesis; September 1973 

5. AUTHOR(S) (First name, middle initial, last name) 
   James Richard Lamping 

6. REPORT DATE 
   September 1973 

7a. TOTAL NO. OF PAGES 
   60 

7b. NO. OF REFS 

8a. CONTRACT OR GRANT NO. 
 b. PROJECT NO. 
9a. ORIGINATOR'S REPORT NUMBER(S) 
9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report) 

10. DISTRIBUTION STATEMENT 
   Approved for public release; distribution unlimited. 

11. SUPPLEMENTARY NOTES 

12. SPONSORING MILITARY ACTIVITY 
   Naval Postgraduate School 
   Monterey, California 93940 

13. ABSTRACT 


This thesis asserts that Cluster Analysis, or Numerical Taxonomy, 
has many potential applications in the field of international relations. 
It demonstrates two representative applications. Both examples treat 
the nations of the world as objects having measurable attributes, and 
both examples use selected attributes to produce a dendrogram (or 
hierarchical classification) of the nations of the world. In one 
example this dendrogram is used to objectively group the nations into 
blocs based on external economic ties. In the other example the 
dendrogram is used to highlight interactions among five variables, 
ignoring the identity of individual nations, the same way a scatter 
plot highlights interactions between two variables. 


DD FORM 1473 (PAGE 1) 
S/N 0101-807-6811                              Security Classification 
A-31408 


Security Classification 

KEY WORDS 

Cluster Analysis 
Dissimilarity Coefficient 
Dendrogram 

DD FORM 1473 (BACK) 
S/N 0101-807-6821 



















